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1 Introduction 

1.1 The problem and previous work 

The quadratic unconstrained binary optimization (QUBO) problem is defined by 


min Qx 
s.t. X € S, 

where, without loss of generality, Q G and S represents the binary discrete set {0,1}^. 

Many NP-hard combinatorial optimization problems arise naturally or can easily be reformulated as 
QUBO problems, such as the quadratic assignment problem, the maximum cut problem, the maximum 
clique problem, the set packing problem, and the graph colouring problem (see, for instance. Boros and 
Prekopa |[T1, Boros and Hammer ||2l, Bourjolly et al. ||3l, Du and Pardalos JT), Pardalos and Rodgers ||5]|^, 
Pardalos and Xue Q, and Kochenberger et al. HD- 

Numerous interesting applications expressed naturally in the form of a QUBO problem have appeared in 
the literature. Barahona et al. ISlIIol formulate and solve the problem of finding exact ground states of spin 
glasses with magnetic fields. Alidaee et al. im study the problem of scheduling n jobs non-preemptively 
on two parallel, identical processors to minimize weighted mean flow time as a QUBO problem. Bomze 
et al. lfT2l give a comprehensive discussion of the maximum clique (MC) problem. Included is the QUBO 
problem representation of the MC problem and a variety of applications from different domains. The QUBO 
problem has been used in the prediction of epileptic seizures 03. Alidaee et al. na discuss the number 
partitioning problem as a QUBO problem. 

To solve a QUBO problem, a number of exact methods have been developed ll5l fT5l432l . However, due 
to the computational complexity of the problems, these approaches have only been able to solve small¬ 
sized instances. To obtain near-optimal solutions, several heuristic and metaheuristic approaches such as 
tabu search EMU, simulated annealing II42II43I . evolutionary algorithms Il38ll444l48l . scatter search 143 . 
GRASP II41II50I . and iterated local search algorithms ED have been proposed. Additionally, preprocessing 
techniques have been developed to simplify the solving of QUBO problems ll52l . 

The majority of existing heuristic algorithms search relatively small neighbourhoods that can be searched 
efficiently. In this paper, we explore the possibility of utilizing a quantum annealer (see Section fL2T l. which 
is able to search a portion of the k-flip neighbourhood of size 2^ in a single operation. We develop a hybrid 
algorithm that incorporates a quantum annealer into a classical algorithm and give experimental results that 
show that this approach could be competitive in the near future. 

While working on this research we noted research on the possibility of solving larger problems by de¬ 
composing them into smaller problems and solving them on the D-Wave machine E3- Although the idea 
of “large-neighbourhood local search” is mentioned in brief in that research, which bears resemblance to the 
basic idea we cover in this paper, it is not developed in detail and no results are reported. 

After the publication of our paper on the arXiv, we became aware of a paper which explores a similar 
idea El- The authors of that work explored the applicability of solution via local updates to two- and 
three-dimensional Ising spin glasses, as well as Chimera-structured problems. Since those problems contain 
only very local couplings, even a simple approach showed success. Our contribution focuses on devising an 
algorithm that is effective for a more general class of problems, sparse or dense, which in general contain 
short- and long-range couplings. 
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1.2 The quantum annealer 

D-Wave Systems has developed a scalable quantum annealer that solves QUBO problem^ albeit with some 
limitations. 

It has been argued that quantum annealing has an advantage over simulated annealing due to quantum 
tunnelling, which allows an optimizer to pass through barriers instead of going over them. This might provide 
a speedup for some problem classes Il55l457ll . Indeed, simulated quantum annealing 058II59I shows a speedup 
for some problem classes ll58l . but not for some others II 6 OII . 

There is strong evidence that the D-Wave machine is quantum MM, but it is yet to be established 
whether it provides a quantum speedup over classical computers. Recently, there has been significant interest 
in benchmarking the D-Wave machines ll6Tl467l . and there is ongoing debate on how to define quantum 
speedup, and on which problems a quantum annealer could be expected to demonstrate it Il68l470l . We 
expect results in the near future to shed light on this important point. 

The connectivity of the D-Wave processor’s qubits is currently given by the Chimera graph GI]. Due to 
the sparsity of this hardware graph, it is often necessary to identify multiple physical qubits with one logical 
qubit in order to increase the connectivity. In particular, it is known that the largest complete graph that 
can be embedded on a Chimera graph of size n is of size + 1 (for instance, for the current-generation 
512-qubit chipH, the largest complete graph is of size 33) II721I73L The parameter misspecihcation error rate 
for the current-generation chip is 3-5%, whereas the next generation of chip, scheduled for release in the 
second half of 2015, will have 1152 active qubits and the error rate reduced by an expected 33% Gl. 

The remainder of this paper is organized as follows. In Sectionl^we describe the algorithm. In Section[3] 
we explain the benchmarking methodology and present computational results. In Sections|4]and|5]we discuss 
the results, the conclusions, and future work. 


2 The General QUBO Problem Solver (GQS) 

2.1 The basic algorithm 

We assume that we have an oracle that optimizes QUBO problems of size k, but our problem has k vari¬ 
ables. How can we use the oracle to solve this larger problem? Our idea, which we refer to as “convergence” 
henceforth, is to iteratively choose k variables out of the N variables, and optimize over those variables while 
keeping the other A^ — k variables fixed, until the value of the objective function does not change for some 
number of steps. This basic idea can be improved upon significantly, by choosing intelligently which vari¬ 
ables to update, for example, or using tabu search to disallow updating of the same variables too often, and 
so on (various improvements are described in Section lZ2l i. One can write a simple algorithm that uses this 
idea to hnd a local minimum for a problem of size N (see Algorithm[T]). 

Each call to the underlying optimizer and the subsequent update will either decrease the objective func¬ 
tion’s value or leave it constant, but is guaranteed not to increase it. This does not mean that the algorithm 
does not hill climb. It does, in fact, do so, but the hill climbing capability is contained within the optimizer 
of size k. This algorithm successfully finds a local minimum; however, we are searching for the global 
minimum, and hence we must improve the algorithm to improve our chances of success. 

* The problem that the machine actually solves is known as an Ising model in the physics community, and can easily be transformed 
into the form x^Qx, which is more common in the scientific computing community. 

^ The chip is current as of mid-July 2015. 






4 


Gili Rosenberg et al. 


Algorithm 1 Basic Algorithm 

1: procedure BasicAlgorithm 

2: x-i— a random N-bit configuration, x € {0,1}*'’ 

3: while not converged do 

4: Choose k variables at random 

5: X <— optimize over k variables, given the fixed N — k variables (using the underlying optimizer) 

6: return x 


2.2 The improved algorithm 

We here describe an improved version of Algorithm [1] which we refer to as the General QUBO Problem 
Solver (GQS). The pseudo-code is presented in Algorithmic] and each of the components is described in 
greater detail below: 

1. Better initial guesses (Section l2.31 

2. Different underlying optimizers (Section l2Al) 

3. Escape strategies (Section l23T l 

4. Choice of variables (Section l276l l 

5. Tabu search for A:-opt (Section l2^ 

6. 1-opt tabu search improvement (Section ICTSl l 

7. Stopping criteria and convergence criterion (Section lZ9] l 

Each of these components is added in a modular way, such that many variations on the algorithm can be 
accommodated within this same framework. The reasoning behind this is that we expect that for different 
classes of problems the best-performing variant of this algorithm will be different. In addition, the mode 
of use is important: the parameters would be different if what we were looking for is a quick solution as 
opposed to a high rate of success. 

We note that the reason to keep track of the one-flip gain^ is that once they are initialized, they can be 
updated very efficiently 075II76I . which allows one to both quickly update the objective function’s value after 
each k-variable update, and to use the one-flip gains to choose the variables to be updated (see Section l2.6.11 ). 


2.3 Better initial guesses 

We use the greedy heuristic given by Merz and Katayama ifTSll to find good (i.e. with low value and diverse) 
starting configurations quickly. Eirst, all variables are initialized to 0.5, then repeatedly flipped to either 0 or 
1 depending on which update provides the largest one-flip gain in the objective function’s value, and then 
updating the one-flip gains and repeating, while updating each bit only once. The bits are initialized to 0.5 
despite being binary (that is, having values of 0 or 1) since the idea is not to bias the configurations initially 
such that any of the bits will be more likely to be 0 or 1 (by the end of the process all of the bits are set to 
either 0 or 1). 

When initializing the reference set, we experimented with using a deterministic version of this greedy 
heuristic for the first element in the reference set (see Algorithm|4]i. In order to obtain a diverse reference set, 
for the remaining elements we used a randomized version, where a random bit is flipped randomly, and at 

^ A one-flip gain is the change in the objective function’s value given a flip of a single bit. We refer to the collection of all possible 
single flips and their corresponding change in the objection function’s value as the “one-flip gains”. 
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Algorithm 2 Improved Algorithm 
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procedure ImprovedAlgorithm 
VjX-f— InitialGuessO 
tininAmin ^ 

C<-{1. N) 

T <- {} 

gains <— Calculate change in objective function value for flipping each bit 
while not Done() do 

ChosenVariables VariableChoice(C, k) 

X <— Optimize over the k ChosenVariables, given the fixed N — k variables (using the underlying optimizer) 

Set ChosenVariables in x to the values given by the previous step 

Update one-flip gains and V 

C,T TabuUpdateO 

if V < Vmin then 

Vinin ^ V 
-tmin ^ -T 

if Converged)) then 
X <— Escape)) 

Update one-flip gains and V 
return Vn,in,Xn,in 


each stage the best flip to 0 or 1 is chosen with a probability that depends on the best one-flip gains. Although 
this results in a diverse set of initial solutions, we found that starting with random configurations gave more 
diverse initial sets (for example, based on the average Hamming distance). 

We note that the randomized version by Merz and Katayama ll75l only works when the highest 1-flip 
gains for both the best 0-flip and the best 1-flip are positive (for example, if they are both negative, p can be 
negative). In addition, in that version the 1-flip gains are not updated after the initial random bit is flipped. 
We also note that our version is for minimization, not maximization. We present our corrected version of the 
pseudo-code; see Algorithm^ 


2.4 Different underlying optimizers 

The proposed GQS algorithm lends itself to being solved using any appropriate optimizer; for example, an 
exhaustive search, a 1-opt tabu search, Gurobi Optimizer, CPLEX, or a quantum annealer. The algorithm has 
been successfully verified using the D-Wave II quantum annealer. However, since annealing technology is at 
an early stage, limitations such as qubit connectivity, number of qubits, and noise and error levels restrict the 
size of problem that can be solved. For this reason, for our benchmarking we used a 1-opt tabu search, with 
a tabu tenure of 15-20 and convergence length of 10k, where k is the size of the underlying optimizer. We 
have verified that the GQS algorithm works when using the quantum annealer as the underlying optimizer. 


2.5 Escape strategies 

Once the local search phase converges, which consists of multiple calls to the underlying optimizer, the GQS 
attempts to “escape” by flipping some number of bits, and then performing another local search phase. If 
the solver converges to a different configuration, then the escape has succeeded in diversifying the search 
space of the algorithm. The most trivial example of an escape is to flip a fixed number of bits randomly. We 
describe some more sophisticated ways of escaping below. 
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Algorithm 3 Randomized Greedy Initial Guess 

1: procedure RandomizedGreedy 

2: C^{1. N} 

3: for 1 € {1, ...,W} do jc,-= j 

4: Calculate one-flip gains g,- for all i € {1, ...,W} 

5: Select k € {1,...,A^} and I € {0,1} randomly and set y* / 

6: C-s-C-W 

7: repeat 

8: Update one-flip gains g, for all i € C 

9: Find /:o with g^ = min,£cg/' and ki with g{ = min,£cg* 

10: ifgO <0 and g‘ <0 then 

11: else if g® >0 and gj > 0 then p <- 

12: else if g®^ > 0 and < 0 then p <— 0 

13 : else if g®^ < 0 and gj.^ > 0 then p t— 1 

14 : else if g®^ = 0 and gl^ = 0 then p <— 5 

15: if random(0, 1) < p then 

16: XjfQ i — 0: C i — C — {^ 0 } 

17: else 

18: Jtjtj ^— 1 i C i — C — {^ 1 } 

19: until C = {} 

20: return x 


Algorithm 4 Deterministic Greedy Initial Guess 


1: procedure DeterministicGreedy 

2: C^{1 . N) 

3: for 1 € {1, ...,W} do Y,-= j 

4: Calculate one-flip gains g,- for all i € {1, ...,W} 

5: repeat 

6: Find ko with g^ = min,£cg? and k^ with gj. = min.gcg,- 

7: ifg^„<gl, then 

8: -CjtQ 4— 0; C -f- C— {^o} 

9: else 

10: ^ 1;C<- C-{j(:i} 

11: Update one-flip gains g, for all i € C; 

12: until C={} 

13: returns 


2.5.7 Path relinking 

Path relinking refers to maintaining a collection of the best solutions to which the solver converged, also 
referred to as a “reference set”, and then escaping to solutions found by fusing two elite solutions that have 
not been fused before. This method is used in Wang et al. 11771 . utilizing a tabu 1-opt local search. 

Algorithm|5] “Path Relinking Escape”, is composed of two phases. In the first, we maintain the reference 
set given a new converged solution. If the converged solution already appears in the reference set, we discard 
it. Otherwise, if the reference set is not full we add the converged solution, and if the reference set is full we 
only add the solution if it is better than the worst solution in the reference set, which we then discard. In the 
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second phase, we perform the escape. If the reference set is full and unfused pairs remain with a sufficiently 
large Hamming distance, we choose an unfused pair randomly, and update the current solution to the fused 
“child”. If the reference set is not yet full, we escape to a random configuration. Once we run out of unfused 
pairs, we empty the reference set except for the best solution thus far found, and restart the procedure. 

There are different ways to fuse the parents. We choose to ignore pairs of parents that have a Hamming 
distance d less than a parent distance threshold (typically a small number such as 5). For parents with a 
Hamming distance greater than that, we identify the bits that are equal and the ones that are different in both 
parents. First, we set the equal bits in the child to be equal to those in the parents. Then we set the different 
bits in the child randomly, such that the resulting child’s Hamming distance to each of the parents is at least 
a user-defined fraction of the parent-parent Hamming distance (for example, 0.3). 


Algorithm 5 Path Relinking Escape 

1: procedure PathRelinkingEscape 
2: if X ^ ref_set then 

3: if ref_set not full then 

4: Add x to ref_set 

5: Update unfused_pairs 

6: else 

7: if U is better than worst in ref_set then 

8: Pop worst from ref_set 

9: Add x to ref_set 

10: Update unfused_pairs 

11: if ref_set is full then 

12: if unfused_pairs is not empty then 

13: X <— Fuse random pair from unfused_pairs 

14: else 

15: Empty ref_set except best 

16: Empty unfused_pairs 

17: X 4—RandomizedGreedyO 

18: else x <— RandomizedGreedyO 

19: return X 


2.5.2 f-smart method 

We keep track of which variables were flipped in recent improvements. Then, when an escape is needed (in 
order to diversify the search space), we randomly flip some number of the most-flipped bits. The reasoning 
behind the flip frequency-based scheme is that certain variables are unstable (defined as being flipped often 
when updating), and those are precisely the variables we should flip to escape (as opposed to the more-stable 
variables). We also track the number of escapes with no update to Emin, and increase the number of variables 
to flip as this number increases, making the flips increasingly drastic when less drastic flips have failed to 
diversify the search space. This is a version of the adaptive memory idea introduced by Glover et al. II78II79II . 


2.6 Choice of variables 

There are many ways to choose the k variables to solve for from amongst the possibilities. The simplest 
one is to choose randomly. However, for all problems we tested there is an advantage to choosing more 
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intelligently. For the benchmarked problems in this paper, a combination of gains-based and fusion-guided 
(see Sections [2.6.1l and l2.6.21 l choice of variables was the most advantageous. 


2.6.1 Gains-based 

In a 1-opt tabu search, the variable flipped is often the variable with the best one-flip gain. We generalize 
this such that we try to update the k variables with the best 1-opt gain. This method can get stuck easily if 
it is not used with tabu search for k-opt (see Section IZTb . since the k variables with the best k one-flip gains 
will be chosen repeatedly, even when no possible flip can be found that lowers the objective function’s value. 
Another possibility is to choose the k variables randomly, weighting the choice based on the one-flip gains. 


2.6.2 Fusion-guided 

This scheme is coupled with path relinking escapes (see Section 12.5.11) . Once a pair of parents has been 
chosen, we can focus our search on the hypercube defined by the symmetric space between these parents. 
Let d be the Hamming distance between the parents. Then, \fk<d, we choose k variables randomly amongst 
the d variables; otherwise, we choose all of the d variables and k — d more variables from the set of variables 
that were equal in the parents. We do this for w iterations, before reverting back to one of the other variable- 
choice methods: gains, coupling, random, etc. In addition, we note that the tabu search for k-opt is not taken 
into account for these w iterations: the tabu list is kept empty and the candidate list is kept full. 


2.6.3 Coupling-based 

Intuitively, we might expect it to be advantageous to choose strongly correlated variables to be considered 
for an update together. There are many ways to do this, and the tradeoff between precision and computational 
time must be considered. We present a simple example; see Algorithm|6] 

This algorithm iterates through two stages until finding k variables out of the candidate variables C. In 
the first stage, a variable is chosen randomly, using probabilities derived from the sum of absolute values of 
coefficients in that variable’s column in the problem matrix Q (we call this the “strength”). In the second 
stage, a variable that is connected (in the underlying adjacency matrix) to the variable in the first stage is 
chosen randomly, using probabilities derived from the absolute value of the coefficients in the first variable’s 
column (we call this the “conditional strengths”). 


2.6.4 Practical considerations for the quantum annealer 

The specifics of the graph of the D-Wave chip are also a consideration. For QUBO problems that are more 
dense than the hardware graph, it is necessary to identify multiple logical qubits with each physical qubit. 
The mapping between logical and physical qubits is often referred to as an “embedding’ ’17211231. 

However, for sparse problems of size N, any subproblems of size k are likely to be sparse as well. For 
this reason, in some scenarios it might be better not to use a complete embedding (forcing k < V^+ 1, 
where n is the number of qubits), since it might be possible to embed considerably larger subproblems than 
that (v^^-f 1 < k < n). The challenge is minimizing the time expense of finding an embedding for each 
subproblem. 

Non-complete embeddings can be utilized in at least two ways: 
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Algorithm 6 Coupling Choice Algorithm 

1: procedure CouplingChoice 
2 : indices_to_update ^ {} (an empty set) 

3: LC^C 

4: LQ ^ project QonC 

5: n ■<—dim(L0 

6 : for i = 1 to n do strength[i] ^ \^Q\ji 

7; probabilities[i] ^ strength[i] / sum(strength), for each i 

8 : while length(indices_to_update) <kdo 

9: p -(r- random(0,1) (first stage) 

10: i ^ C[index for interval in which p falls in cumulative probabilities] 

11: if i € LC then remove i from LC (exclude from second stage) 

12: Add i to indices_to_update 

13: if length(indices_to_update) = k then break 

14: for j € LC do conditional_strengths[ 7 ] ^ \LQji\ 

15: if ^(conditional_strengths) = 0 then continue 

16: Divide conditional_probabilities by ^(conditional_strengths) 

17: prandom(0,1) (second stage) 

18: m ^ LC[index for interval in which p falls in cumulative conditional_probabilities] 

19: Remove m from LC 

20: Add m to indices_to_update 

21 : return indices_to_update 


1. Find a new embedding before each call. The disadvantage to this method is the time required to find an 
embedding, and the fact that one cannot know for sure if a graph can be embedded without trying to 
embed it. It may be possible to devise a heuristic that could judge which size and density combinations 
are likely to be embeddable, but multiple calls to an embedding finder might still be necessary before an 
embedding is found and can be used for a call to the quantum annealer. 

2. Find a pool of subproblems and the corresponding embeddings in advance, and then choose one to use at 
each step. The advantage to this method is apparent for problems that share the same adjacency matrix, 
but the coefficients differ. One can then find a pool of subproblems and corresponding embeddings in 
advance, and at each stage choose one of the prechosen subproblems to optimize over. This method can 
be combined with the above methods; for example, after determining the k variables with the best one- 
flip gains, we can choose the subproblem from the pool that contains the largest number of variables 
from the group of k variables. 


2.7 Tabu search for k-opt 

We define a version of tabu search for k-opt: we maintain a queue of length TT (the tabu tenure), in which 
each element is a list of variables of length k. While the queue is not full we add a new group of k variables 
at each update. Once the queue is full, we begin to also pop the oldest list of k variables at each step. We 
also try a version of this in which only the variables that were actually flipped in the update are marked as 
tabu. In this case the elements in the queue are not of equal length, and indeed are sometimes empty (if no 
variable out of the k was flipped in the update). 

Unlike 1-opt tabu search, the objective of using the tabu tenure here is not to allow hill climbing, but 
rather to force the same variables to not be chosen for an update attempt too frequently. The reason for this 
is that when optimizing over the k variables, at worst the same solution will be chosen for those k variables 
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(based on the way our algorithm is built). Hill climbing occurs from within the underlying optimizer; hills 
of up to k flips can be climbed, in principle. Since most tabu search algorithms employ a tabu tenure of up 
to ^20, and typically k > 20, we consider this to be sufficient hill climbing capability. 


2.8 1-opt tabu search improvement 

We allow the option of 1-opt tabu search improvement phases immediately after an escape (before the A:-opt 
phase), after convergence (after the A:-opt phase), or with a probability p of occurring instead of a A;-opt update 
(disregarding the tabu queue of the A;-opt). We typically set the tabu tenure low (or to 0) and the convergence 
length short, as the idea is to get a quick improvement. We motivate this improvement by noting that despite 
k-opt digging “deeper” than 1-opt, the latter has an advantage in that it is able to consider all of the variables 
at once for an update. When N is much larger than k, each variable is visited less frequently by the GQS, and 
this advantage becomes increasingly important. We note that these optional tabu search 1-opt improvement 
phases are not indicated in the pseudo-code in the interest of clarity. 


2.9 Stopping criteria and convergence criterion 

For benchmarking purposes, we dehne stopping criteria as a maximum time limit or a maximum number 
of escapes to be performed. In addition, for some practical problems a known objective function’s value is 
sufficient, so we also allow stopping once a certain value is reached. 

For the convergence criterion, we dehne convergence as there being no update to the minimum value 
found since the last escape, within CL (convergence length) steps from the last escape. We also dehne an 
objective function’s tolerance tol. Only prospective updates that lower the objective function’s value by more 
than tol are accepted. 


3 Benchmarking 

3.1 Methodology 

For benchmarking purposes, we simulated larger quantum annealers in order to make a claim regarding the 
scaling of the solution time and/or gap (that is, the difference in the objective function’s value from the best 
known solution). To do this, we replaced D-Wave’s machine’s being the underlying optimizer with a 1-opt 
tabu search with short-term memory and aspiration (see, for example, Il79l ). and then replaced the time the 1- 
opt tabu search takes with the expected time it would take the D-Wave solver to solve that problem (we chose 
0.02 second^. This allowed us to simulate a D-Wave machine that is many years away (at a signihcant cost 
in time). We have verified that the algorithm works with the D-Wave machine as the underlying optimizer, 
but due to delays as explained in Section there are practical reasons why it is preferable to simulate the 
quantum annealer for our benchmarking. 

The actual time it will take a new D-Wave machine to solve a problem of size k in the future is unknown, 
and it may be more or less than the number we used. This is due to advances in chip technology, and also 
depends on the way in which the machine will be used: it is possible to use a small number of reads, a 

^ We chose 0.02 seconds based on 1000 anneals each taking 20 microseconds (which is the current minimum anneal time). This is 
the actual time for computation, and does not include extra time for programming the chip, thermalization, etc. Since future run times 
are not known, this number is only meant as a rough estimate, to give an idea of the actual time the computation could take. 
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short annealing time, and a short wait before readout to lower the time required. It is also possible that as 
the number of qubits grows, a longer annealing time may be required in order to maintain the quality of 
results (this is expected once quantum annealers reach the threshold where if they were to anneal any faster 
the results would become worse). For these reasons, we also report the number of iterations (calls to the 
D-Wave machine) which would remain accurate regardless of future run time. The total time for each call is 
composed of a “quantum time”, which is the product of the number of iterations and the presumed run time 
(see above), and a “classical time”, including all other solver time before and after the underlying optimizer 
calls. 

For each problem set and each parameter set, we solved each problem 32 times with a time out of 90 
seconds of wall-clock time for the problems of size 500, 2250 seconds for the problems of size 2500, and 
1.2A^ seconds for the problems of size 3000 and above. We report the time to best solution, the average 
relative gap between the best solution found compared to the best known solution, the number of underlying 
optimizer iterations to the best solution, and the success rate (the percentage of repetitions that reached the 
best known solution). 

We benchmarked the GQS on three sets of problems. The first is the OR-Library unconstrained binary 
quadratic programming problems Il80l . We generated these problems (labelled “bqp” below) by randomly 
choosing integers between —100 and +100 from a uniform distribution such that the resulting problem has 
a density of 0.1 (that is, only 10% of the elements are non-zero). They have been widely benchmarked in the 
combinatorial optimization literature. We used the best known solutions from Tavares ED. 

The second set of problems, random fully dense QUBO problems (labelled “RFDQ” below) was gen¬ 
erated in the same way, but they have a density of 1. We obtained the best known solutions by running a 
multi-start 1-opt tabu search solver with a large number of iterations, as well as performing a second check 
by running the path relinking algorithm of Wang et al. 11771 . The third set of problems was generated using 
the problem generator and settings written by Palubecki^ ll82l with densities ranging from 0.5 to 1, and we 
used the best known solutions reported in Wang et al. lITTI . 

All benchmarking was performed on Amazon Web Services using a c3.8xlarge instance, which has 32 
virtual CPUs running at 2800 MHz. Each call to the GQS was run serially on a single CPU. The code was 
written in Python, utilizing optimized external libraries such as Numpy and Scipy. An efficient implementa¬ 
tion written in a lower-level language such as C would give faster times. However, we expect our results to 
still hold qualitatively, and much of the prospective time advantage of a lower-level programming language 
is cancelled by our method of replacing the true run time of the underlying optimizer with the assumed run 
time of the D-Wave annealer (the number of iterations would remain the same). 


3.2 Parameters 

In the results below, we set the parameters that were not varied as listed in Table[D The settings were chosen 
empirically from problems of sizes up to 500 from OR-Library as well as prior literature (such as lITTl ). The 
names and descriptions of the parameters that were varied are in Table |2] 

In the case of a quantum annealer, k refers to the largest complete problem that can be solved with that 
quantum annealer. Depending on the physical connectivity of the chip, the number of qubits in the quantum 
annealer n could be much larger (see Section fL^ . 


^ The source code of the generator and the input files to create these problems can be found at 
http; //www.proin.ktu. lt/”gintaras/ubqop_its .html This page was last retrieved on July 21, 2015. 
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Table 1 Benchmarking parameters (constant) 


Name 

Value 

Description 

optimizer_TT 

optimizer_CL 

parent_distance_thre shold 

child_distance_threshold 

num_elite_solutions 

variable_choice 

presumed_time 

tol 

15-20 

m 

5 

0.33 

10 

“gains” 
0.02 sec 

lo-** 

tabu tenure for the 1-opt tabu search underlying optimizer 
maximum iterations with no improvement for the 1-opt tabu search 
underlying optimizer 

minimum Hamming parent-parent distance 
minimum Hamming parent-child distance 
number of elite solutions to keep 
variable choice type after w iterations (see below) 
presumed run time for quantum annealer 
minimum change in value for an update 


Table 2 Benchmarking parameters (varying) 


Name 

Description 

k 

variable group size for underlying optimizer 

CL 

maximum calls to the underlying optimizer with no improvement 

W 

number of fusion-guided iterations 

TT 

^-opt tabu tenure, typically oc N/k 

W 

tabu whole group (True/False) 


3.3 Results 

3.3.1 OR-Library and RFDQ: 500 to 2500 variables 

We present results for the OR-Library 500 and 2500 problems and RFDQ problems of size 500 below. 
We benchmarked the smaller instances comprehensively to choose the best settings, optimizing for highest 
success rate, which we then used for all remaining benchmarks. These runs were based on the path relinking 
escapes, with an initial reference set initialized from random, and updated using an underlying optimizer of 
size 50 with a convergence length of 3 and a k-opt tabu tenure of 6 (leading us to use TT = Q.6N/k hereafter), 
marking as tabu all variables chosen at each update (W = True). The variables chosen to be updated were 
based on the fusion-guided approach with w = 1, and based on the best one-flip gains thereafter (and during 
the creation of the reference set). The underlying optimizer was 1-opt tabu search with a tabu tenure of 15 
and convergence length 500. We verified that the algorithm is not very sensitive to the exact parameter values 
chosen: the differences are small, and are close to the statistical variation of the results. 

Table |3] includes the results for each problem set, averaged over the 10 problems in each set, as well as 
32 repetitions for each problem. Detailed results for each problem (averaged over the 32 repetitions) in the 
OR-Library and RFDQ problem sets are presented in Table|4]and Tabled respectively. 


Table 3 Results for size 500 for k — 50, CL — 3, TT — 64, w — I, fusion-guided path relinking with W — True. The average time 
to best solution in seconds (assuming an underlying optimizer call time of 0.02 seconds) is denoted <T>, the average gap to the best 
known solution as a percentage is denoted <G>, the fraction of GQS calls that resulted in finding the best known solution is denoted 
“succ.”, and the average number of calls to the underlying optimizer is denoted </>. 


problem set 

<T> 

STD(r) 

<G> 

STD(G) 

succ. 

<I> 

STD(/) 

bqp500 

4.83 

2.60 

0.02 

0.06 

60.62 

158.3 

91.2 

RFDQ500 

2.98 

2.50 

0.01 

0.04 

78.75 

95.9 

87.8 
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Table 4 Detailed results for OR-Library 500 for k — 50, CL — 3, TT — 6,w— 1, fusion-guided path relinking with W — True. See the 
caption of Table[3]for a definition of <T>, <G>, “succ.”, and </>. 


name 

<T> 

STDtr) 

<G> 

STD(G) 

SUCC. 

<I> 

STD(/) 

bqp500-l 

5.85 

2.51 

0.10 

0.13 

50.00 

197.3 

87.9 

bqp500-2 

3.74 

2.54 

0.00 

0.01 

90.62 

118.3 

88.3 

bqp500-3 

1.55 

1.00 

0.00 

0.00 

100.00 

45.8 

33.7 

bqp500-4 

5.93 

1.89 

0.01 

0.01 

34.38 

188.2 

67.1 

bqp500-5 

3.52 

1.82 

0.00 

0.02 

96.88 

113.7 

64.1 

bqp500-6 

6.25 

1.80 

0.04 

0.04 

46.88 

206.3 

64.2 

bqp500-7 

6.35 

2.10 

0.02 

0.02 

37.50 

209.4 

75.6 

bqp500-8 

6.34 

2.10 

0.04 

0.02 

12.50 

211.5 

76.7 

bqp500-9 

5.96 

1.99 

0.02 

0.04 

40.62 

204.9 

71.1 

bqp500-10 

2.85 

1.75 

0.01 

0.06 

96.88 

87.2 

58.7 


Table 5 Detailed results for RFDQ 500 for k — 50, CL — 3, TT = 8, w = 1, fusion-guided path relinking with W — True. See the 
caption of Table[3]for a definition of <T>, <G>, “succ.”, and </>. 


name 

<T> 

STD(r) 

<G> 

STD(G) 

SUCC. 

<I> 

STD(/) 

RFDQ500-1 

3.91 

2.59 

0.08 

0.03 

12.50 

129.7 

91.1 

RFDQ500-2 

4.71 

2.94 

0.01 

0.01 

62.50 

156.8 

104.5 

RFDQ500-3 

1.25 

0.77 

0.00 

0.00 

100.00 

36.2 

24.1 

RFDQ500-4 

3.74 

1.95 

0.03 

0.06 

75.00 

121.7 

67.2 

RFDQ500-5 

1.52 

1.01 

0.00 

0.00 

100.00 

44.7 

33.1 

RFDQ500-6 

5.03 

2.69 

0.02 

0.05 

84.38 

173.2 

95.3 

RFDQ500-7 

3.07 

2.62 

0.01 

0.01 

68.75 

95.5 

89.1 

RFDQ500-8 

1.26 

0.76 

0.00 

0.00 

100.00 

34.6 

23.2 

RFDQ500-9 

3.94 

2.43 

0.01 

0.01 

84.38 

128.7 

86.2 

RFDQ500-10 

1.32 

0.86 

0.00 

0.00 

100.00 

37.8 

28.2 


Table 6 Results for size 2500 for k — 50, CL — 3, TT — 30, w — fusion-guided path relinking with W — True. See the caption of 
Table[3]for a definition of <T>, <G>, “succ.”, and </>. 


identifier 

<T> 

STDCT) 

<G> 

STD(G) 

SUCC. 

</> 

STD(/) 

bqp2500 

173.07 

90.76 

0.02 

0.02 

34.06 

3798.0 

2008.6 


In addition. Table |6] includes the results for OR-Library 2500, averaged over the 10 problems in each 
set, as well as 32 repetitions for each problem. Detailed results for each problem (averaged over the 32 
repetitions) in the OR-Library problem sets are presented in Table [T] 

3.3.2 Palubeckis: 3000 to 7000 variables 

Results for the Palubeckis instances of sizes 3000-7000 are presented in Table[8] 

3.3.3 Summary of all results: 500 to 7000 variables 

We summarize all of the results for average time and average gap in Fig.[T] 

3.3.4 Benchmarking for practical requirements 

For practical use, we expect that for many use cases, once a solution has been found that is close enough 
to the global optimum, one can stop the search based on stopping criteria, even if the confidence in having 
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Table 7 Detailed results for RFDQ 500 for k = 50, CL = 3, TT = 30, w = 1, fusion-guided path relinking with W = True. See the 
caption of Table[^for a definition of <T>, <G>, “succ.”, and <1>. 


name 

<T> 

STD(r) 

<G> 

STD(G) 

SUCC. 

<I> 

STD(/) 

bqp2500-l 

180.63 

82.01 

0.01 

0.01 

34.38 

3966.6 

1804.7 

bqp2500-2 

197.50 

95.32 

0.05 

0.02 

0.00 

4381.8 

2091.1 

bqp2500-3 

190.15 

74.70 

0.03 

0.02 

9.38 

4240.1 

1679.0 

bqp2500-4 

1131 

49.22 

0.00 

0.00 

100.00 

1634.7 

1070.9 

bqp2500-5 

140.69 

87.81 

0.00 

0.00 

62.50 

3033.3 

1915.4 

bqp2500-6 

198.85 

80.40 

0.01 

0.01 

25.00 

4334.7 

1761.6 

bqp2500-7 

245.32 

67.62 

0.03 

0.02 

18.75 

5387.8 

1472.5 

bqp2500-8 

170.00 

85.64 

0.01 

0.01 

46.88 

3716.1 

1866.9 

bqp2500-9 

148.07 

83.10 

0.01 

0.01 

43.75 

3241.9 

1854.1 

bqp2500-10 

182.20 

88.32 

0.05 

0.02 

0.00 

4043.4 

1965.8 


Table 8 Detailed results for Palubeckis instances of size 3000-7000 for k = 50, CL = 3, w = 1, fusion-guided path relinking with W = 
True. See the caption of Table[3]for a definition of <T>, <G>, “succ.”, and </>. 


name 

<T> 

STD(7') 

<G> 

STD(G) 

SUCC. 

<I> 

STD(/) 

p3000-l 

292.89 

135.37 

0.03 

0.03 

40.62 

5967.0 

2770.8 

p3000-2 

280.17 

124.55 

0.01 

0.01 

18.75 

5614.8 

2486.6 

p3000-3 

313.99 

153.08 

0.03 

0.02 

15.62 

63613 

3080.9 

p3000-4 

368.17 

148.97 

0.04 

0.01 

3.12 

7423.2 

2978.1 

p3000-5 

323.24 

142.11 

0.03 

0.01 

0.00 

6493.5 

2843.6 

p3000 

315.69 

144.39 

0.03 

0.02 

15.62 

6373.2 

2903.9 

p4000-l 

599.18 

143.46 

0.02 

0.03 

43.75 

10431.8 

2481.5 

p4000-2 

466.75 

187.62 

0.05 

0.02 

6.25 

8018.7 

3202.9 

p4000-3 

544.72 

151.48 

0.04 

0.02 

0.00 

9430.6 

2619.2 

p4000-4 

403.19 

210.82 

0.02 

0.02 

40.62 

6996.3 

3650.3 

p4000-5 

564.33 

184.65 

0.06 

0.02 

0.00 

9680.3 

3147.6 

p4000 

515.63 

191.05 

0.04 

0.03 

18.12 

8911.5 

3290.8 

p5000-l 

773.82 

276.58 

0.06 

0.03 

0.00 

11708.2 

4163.0 

p5000-2 

614.80 

263.88 

0.04 

0.02 

0.00 

9254.3 

3917.7 

p5000-3 

623.26 

251.04 

0.07 

0.03 

3.12 

9459.2 

3739.8 

p5000-4 

783.66 

228.80 

0.07 

0.03 

0.00 

11714.3 

3402.6 

p5000-5 

774.58 

211.93 

0.05 

0.02 

0.00 

11426.0 

3088.7 

p5000 

714.02 

259.46 

0.06 

0.03 

0.62 

10712.4 

3846.7 

p6000-l 

1018.34 

302.98 

0.08 

0.04 

0.00 

13584.2 

3974.2 

p6000-2 

944.05 

246.74 

0.06 

0.03 

0.00 

12556.8 

3255.0 

p6000-3 

955.48 

293.04 

0.06 

0.03 

0.00 

12813.8 

3909.6 

p6000 

972.62 

283.87 

0.07 

0.03 

0.00 

12985.0 

3752.6 

p7000-l 

1215.57 

362.55 

0.06 

0.02 

0.00 

14539.8 

4285.1 

p7000-2 

1209.88 

371.84 

0.07 

0.02 

0.00 

14574.3 

4373.1 

p7000-3 

1179.84 

337.17 

0.09 

0.02 

0.00 

14225.0 

3940.6 

p7000 

1201.76 

357.83 

0.07 

0.02 

0.00 

14446.4 

4206.7 


successfully found the global optimum is not high. The user can define how close is sufficient. With this in 
mind, we can run the solver such that it stops once we find a solution with a given gap to the best known 
solution. In practice, the average time (and distribution of times) necessary to do this for a given set of 
problems could then be used as a guideline for the time it would require to attain that level of quality of 
solutions to new problems (assuming they are of similar difficulty). 

As an example, we consider that for many applications, once a local optimum has been found with a gap 
of, for example, 0.1%, there is little benefit in searching any further. One reason for this is that the parameters 
of the problem often have considerable uncertainties associated with them, leading to an uncertainty in the 
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Fig. 1 (a) Average time as a function of the problem size. Data was collected for GQS with parameters k — 50, CL = 3, w = 1, 
TT — 0.6N/k, fusion-guided path relinking with W — True, averaged over the respective problems and 32 repetitions per problem, (b) 
Average gap as a function of problem size. Note for both figures: the results for OR-Library 500 were moved slightly to the right for 
clarity. 


objective function’s value that is larger than the gap itself. In other applications, such as machine learning, it 
is undesirable to “over-fit”, by finding the very best fit of the model to the training data, because it will not 
generalize to the next set of data, and for this reason optimizers will often intentionally be stopped early. 

We investigated how the time required varied as we change the desired gap. Fig. |2]provides results for 
the OR-Library and RFDQ problems of size 500. For these problems, we found that the time required to find 
a solution that is, for instance, 1% from the best known solution is low. In addition, as we require finding a 
solution closer to the best known solution, the time rises rapidly, as does the variation in the time required to 
find said solution, for both types of problems. 




Fig. 2 (a) Average time as a function of the required gap. Data was collected for GQS with parameters k — 50, CL — 3, TT = 8, w = 1, 
fusion-guided path relinking with W — True, for OR-Libraiy problems of size 500, averaged over the ten problems and 32 repetitions, 
(b) The same for RFDQ problems of size 500. 
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3.3.5 Dependence on size of underlying optimizer 

We investigated the dependence of the average time to best solution on the underlying optimizer size; see 
Fig. [3a). We found that the time to best solution falls exponentially, as can be seen by observing that the time 
to best solution falls linearly as a function of the logarithm of the underlying optimizer size; see Fig.jjb). 
The expectation is that as k ^ N, the number of iterations required should go to 1, giving a time equal 
to the presumed time for one call to the underlying optimizer (in this case 0.02 seconds). The exponential 
decrease can be understood by observing that as the underlying optimizer increases in size, it is able to scan 
an exponentially increasing variable space in constant time (based on our assumption of constant time). 




Simulated D-Wave mini-solver size 


Simulated D-Wave mini-solver size 


Fig. 3 (a) Average time to best solution as a function of k, the simulated D-Wave chip size. Data was collected for GQS with parameters 
CL = 3, w = 1, fusion-guided path relinking with W — True for OR-Library problems of size 2500, averaged over the ten problems and 
32 repetitions, and TT was chosen such that the tabu list is 60% of the problem size, (b) Log of the average time to best solution as a 
function of the underlying optimizer size (same data as (a)). 


We also investigated the dependence of the average gap obtained on the underlying optimizer size; see 
Fig. |4| We see that the gap decreases slowly. Intuitively, as the underlying optimizer size is increased, the 
GQS considers flipping larger and larger subgroups of bits, which contain within them the bit flips for any 
smaller underlying optimizers. Hence, we expect the gap to stay constant at worst, and at best to improve. 


4 Discussion 

We have also tried solving the two sets of problems bqp500 and RFDQ500 with a 1-opt tabu search with 
a tabu tenure of 20 and a convergence length of 2500 with 100 repetitions per problem (see Table O. The 
results show that the time to best solution is only slightly higher for the dense problems (recall that the 
OR-Library problems have a density of 0.1), but the success rate is much higher and the gap is lower. Based 
on this, finding a good local optimum appears to be similarly hard in both cases (for tabu 1-opt search), but 
finding the global optimum is harder in the sparse problems. For most practical applications, we expect the 
difference between the gaps (0.02% and < 0.005%) to be negligible, so for practical purposes it appears that 
these two sets of problems are of similar difficulty. 

We note that one might have expected that the dense problems would be harder, but they do not appear to 
be. To explain this, we hypothesize that sparse random problems contain many more local minima (relative 



















Building an iterative heuristic solver for a quantum annealer 


17 



Simulated D-Wave mini-solver size 

Fig. 4 Average gap as a function of the simulated D-Wave chip size, with the same parameters as in Fig.|^ 


Table 9 1-opt tabu results. See the caption of Table|3]for a definition of <T>, <G>, and “succ”. The average number of one-bit flips is 
denoted <I>. 


set 

<T> 

STD(r) 

<G> 

STD{G) 

SUCC. 

<I> 

STD(/) 

bqpSOO 

0.30 

0.19 

0.02 

0.02 

52.00 

1347 

974 

RFDQ500 

0.38 

0.30 

0.00 

0.01 

95.00 

1837 

1596 


to the one-flip neighbourhood), and hence the optimizer can more easily get stuck in a local minimum and 
miss the global optimum. A possible explanation for this is that given a solution and its objective function’s 
value, in a sparse problem the probability of flipping some number of bits and ending up at a similar objective 
function’s value is much higher than in a dense problem. It follows that the probability of the existence of 
multiple local minima with a similar value of the objective function is much greater. 

Comparing the results of GQS in Table [3] (where k = 50) with the results for the 1-opt tabu search in 
Table |9] we note first that the GQS results are marginally better for the OR-Library problems (that is, they 
have a higher success rate) and marginally worse for the RFDQ problems. Secondly, we note that the time 
to best solution is considerably longer for the GQS for both sets of problems. 

As we increase the underlying optimizer’s size, we expect the average time to best solution to drop and 
the quality of the solution (that is, the average gap) to either decrease or remain constant. Based on this 
expectation, we predict that when the underlying optimizer’s (the D-Wave chip) size is large enough, it will 
eventually beat 1-opt tabu search and all other classical solvers (see Section [3.3.51 l. Exactly when this occurs 
will depend on the quality and run time of the D-Wave processor. 


5 Conclusions and Future Work 

We have shown that it is possible to use the D-Wave machine to heuristically solve significantly larger 
problems than the chip intrinsically allows. If in the future the D-Wave machine is able to optimize QUBO 
problems faster than the best classical algorithms, this contribution could make the machine more suitable 
for solving real-world problems earlier. Based on our approach, we found that sparse and dense random 
problems of size 500 can be optimized using a k = 50 simulated quantum annealer to within 0.02% of the 
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optimum in 100-160 iterations, assuming the quantum annealer’s solution quality is similar to 1-opt tabu 
search. If we assume that the run time is 0.02 seconds, solving these problems requires 3-5 seconds. Keeping 
the solver size constant, for larger problem sizes, the number of iterations increases rapidly and the quality 
of the produced solutions degrades, although it remains within 0.07% for all problems, even for the dense 
problems of size 7000 (and would likely reduce further if we increased the time out). We would expect these 
results to improve when using a better underlying optimizer (that is, one with a higher success rate). 

Although the time to best solution for a D-Wave chip with a complete graph of size k = 50 is not com¬ 
petitive, we have shown that as k grows, the time required to solve a given problem drops exponentially, 
assuming constant call time to quantum annealer. Based on this assumption, and assuming quantum anneal¬ 
ers scale faster than classical algorithms on classical hardware, we would expect that the D-Wave chip will 
eventually beat 1-opt tabu search and all other classical solvers. Exactly when this will occur will depend on 
the quality and run time of the processor. 

Our estimate of the time that will be required in the future for a D-Wave machine to solve different 
problems has an unknowable precision since it depends on future advances in engineering which could 
change the run time of the machine. Similarly, future advances in error correction Il83l485ll and expected 
decreases in intrinsic error will affect the quality of the quantum annealer’s results. To the extent that this 
quality is better than the underlying optimizer we used to simulate the D-Wave machine, the actual D-Wave 
machine’s solving time could decrease faster than estimated. 

One way in which our algorithm could be improved is by implementing preprocessing. There are many 
ways in which QUBO problems can be preprocessed to decrease the size of the problem (several are re¬ 
viewed in HI]). In addition, using a better underlying optimizer such as the path relinking algorithm would 
undoubtedly improve the results llTTlI . Finally, we look forward to improvements in the D-Wave machine 
in the coming years, at which point we hope to see competitive results that use the machine itself as the 
underlying optimizer (and possibly using the embedding ideas from Section l2.6.4b . 
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